# Large Language Models

PaSa

PaSa is an advanced academic paper search agent developed by ByteDance, based on large language model (LLM) technology. It can autonomously invoke search tools, read papers, and filter relevant references to obtain comprehensive and accurate results for complex academic queries. This technology is optimized through reinforcement learning, trained using the synthetic dataset AutoScholarQuery, and has shown outstanding performance on the real-world query dataset RealScholarQuery, significantly outperforming traditional search engines and GPT-based methods. The main advantages of PaSa lie in its high recall and precision rates, providing researchers with a more efficient academic search experience.

self-adaptive-llms

SakanaAI/self-adaptive-llms is the open-source implementation of Transformer², a self-adaptive framework designed to address the limitations of traditional fine-tuning, which is computationally intensive and produces models with static capabilities across diverse tasks. The framework adapts large language models (LLMs) in real time during inference using a two-step mechanism: first, a dispatch system identifies the properties of the incoming task; then, task-specific 'expert' vectors trained via reinforcement learning are dynamically mixed to achieve the target behavior for the input prompt. Key advantages include real-time task adaptability, computational efficiency, and flexibility. Developed by the SakanaAI team, the project is open source on GitHub, where it had 195 stars and 12 forks at the time of listing.
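The two-step mechanism described above (identify the task, then mix expert vectors) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the keyword-based `dispatch`, the `EXPERTS` table, and its values are hypothetical stand-ins, not the project's code, where the expert vectors are learned with reinforcement learning.

```python
from typing import Dict, List

# Hypothetical expert vectors (illustrative values only).
EXPERTS: Dict[str, List[float]] = {
    "math": [1.2, 0.8, 1.0],
    "code": [0.9, 1.1, 1.3],
}

# Hypothetical keyword lists used by the toy dispatcher below.
KEYWORDS: Dict[str, List[str]] = {
    "math": ["solve", "integral"],
    "code": ["python", "function"],
}


def dispatch(prompt: str) -> Dict[str, float]:
    """Step 1 (toy stand-in): weight each task by keyword hits in the prompt."""
    text = prompt.lower()
    scores = {task: sum(word in text for word in words)
              for task, words in KEYWORDS.items()}
    total = sum(scores.values())
    if total == 0:
        # No evidence for any task: fall back to a uniform mixture.
        return {task: 1.0 / len(scores) for task in scores}
    return {task: score / total for task, score in scores.items()}


def mix_experts(weights: Dict[str, float]) -> List[float]:
    """Step 2: blend the expert vectors using the dispatch weights."""
    dim = len(next(iter(EXPERTS.values())))
    return [sum(weights[task] * EXPERTS[task][i] for task in EXPERTS)
            for i in range(dim)]


if __name__ == "__main__":
    weights = dispatch("Write a Python function that reverses a list")
    print(weights, mix_experts(weights))
```

In a real system the mixed vector would then modulate the model's weights for that prompt; the sketch stops at the mixing step because the listing does not describe how the vector is applied.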

Sonus-1

Sonus-1 is a series of large language models (LLMs) launched by Sonus AI, designed to push the boundaries of artificial intelligence. These models are engineered for high performance and versatility across various applications, including versions such as Sonus-1 Mini, Sonus-1 Air, Sonus-1 Pro, and Sonus-1 Pro (with Reasoning) to cater to different needs. The Sonus-1 Pro (with Reasoning) has excelled in multiple benchmarks, especially in reasoning and mathematical tasks, demonstrating its capability to surpass other proprietary models. Sonus AI is committed to developing high-performance, affordable, reliable, and privacy-focused large language models.

FlagEval

FlagEval is a model evaluation platform focused on assessing large language models and multimodal models. It provides a fair and transparent environment for comparing different models under the same standards, helping researchers and developers understand model performance and advancing artificial intelligence technology. The platform covers various model types, including conversational models and visual-language models, supports the evaluation of both open-source and closed-source models, and offers specialized evaluations like K12 subject assessments and financial quantitative trading evaluations.

CosyVoice 2

CosyVoice 2 is a speech synthesis model developed by the SpeechLab@Tongyi team at Alibaba Group. It is based on supervised discrete speech tokens and combines two popular generative models, language models (LMs) and flow matching, to achieve high naturalness, content consistency, and speaker similarity. Speech synthesis of this kind matters for multimodal large language models (LLMs), particularly in interactive experiences where response latency and real-time factor are crucial. CosyVoice 2 improves the codebook utilization of speech tokens through finite scalar quantization, simplifies the text-to-speech language model architecture, and introduces a chunk-aware causal flow matching model to support different synthesis scenarios. Trained on large-scale multilingual datasets, it achieves human-parity synthesis quality with very low response latency and real-time performance.
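The scalar quantization step mentioned above can be sketched as follows, assuming the common finite scalar quantization recipe: bound each latent dimension, round it to a small set of integer levels, and read the rounded vector as a mixed-radix codebook index. The function names, the `tanh` bounding, and the level count `k` are illustrative assumptions, not CosyVoice 2's actual code.

```python
import math
from typing import List


def fsq_quantize(latent: List[float], k: int = 1) -> List[int]:
    """Bound each dimension with tanh, then round to an integer in [-k, k]."""
    return [round(k * math.tanh(x)) for x in latent]


def fsq_index(codes: List[int], k: int = 1) -> int:
    """Interpret the quantized vector as digits in base (2k + 1)."""
    base = 2 * k + 1
    return sum((c + k) * base ** j for j, c in enumerate(codes))


if __name__ == "__main__":
    codes = fsq_quantize([2.0, -0.1, -3.0])
    print(codes, fsq_index(codes))
```

Because every combination of levels is a valid code, the implicit codebook has `(2k + 1) ** d` entries for a `d`-dimensional latent, which is one reason this style of quantizer tends to use its codebook more fully than a learned lookup table.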

Command R7B

Command R7B is a high-performance, scalable large language model (LLM) introduced by Cohere, specifically designed for enterprise applications. It delivers top-tier speed, efficiency, and quality while maintaining a compact model size, significantly lowering the production deployment costs of AI applications on standard GPUs, edge devices, or even CPUs. Command R7B excels in multilingual support, retrieval-augmented generation (RAG), reasoning, tool usage, and agent behavior, making it ideal for enterprises focusing on optimizing speed, cost efficiency, and computational resources.

MLPerf Client

MLPerf Client is a newly developed benchmark created in collaboration with MLCommons, aimed at evaluating the performance of large language models (LLMs) and other AI workloads on personal computers (from laptops to desktops to workstations). This benchmark simulates real-world AI tasks to provide clear metrics on how systems handle generative AI workloads. The MLPerf Client working group hopes this benchmark will drive innovation and competition, ensuring that personal computers can meet the challenges of an AI-driven future.

Model Training and Deployment

InternVL2_5-38B

InternVL 2.5 is a series of multimodal large language models launched by OpenGVLab, featuring significant enhancements in training strategies, testing strategies, and data quality improvements over InternVL 2.0. This series can process image, text, and video data, demonstrating capabilities in multimodal understanding and generation, positioning it at the forefront of the multimodal AI field. The InternVL 2.5 series provides robust support for multimodal tasks with its high performance and open-source attributes.

Sandbox Fusion

Sandbox Fusion is a multifunctional code sandbox specifically designed for large language models (LLMs). It supports up to 20 programming languages and can comprehensively test multiple domains, including programming, mathematics, and hardware programming. Sandbox Fusion integrates over 10 coding-related assessment datasets, which feature standardized data formats and are accessible via a unified HTTP API. Additionally, Sandbox Fusion is optimized for cloud infrastructure deployment and offers built-in security isolation when privileged containers are available. Developed by ByteDance, Sandbox Fusion aims to provide developers with a secure and efficient code testing environment.

Development & Tools

Star-Attention

Star-Attention is a block-sparse attention mechanism proposed by NVIDIA to improve the inference efficiency of Transformer-based large language models (LLMs) on long sequences. It speeds up inference substantially through a two-stage operation while preserving 95-100% of the accuracy of the original model. It is compatible with most Transformer-based LLMs, can be used directly without additional training or fine-tuning, and can be combined with other optimizations such as Flash Attention and KV cache compression to further improve performance.
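One ingredient of block-wise attention schemes like this can be shown in a few lines: attention computed separately over blocks of keys and values can be merged into the exact global softmax result if each block also reports the log-sum-exp of its scores. The sketch below is a single-query, pure-Python illustration under that assumption; it is not NVIDIA's implementation and does not model the approximate block-local context encoding stage.

```python
import math
from typing import List, Tuple

Vector = List[float]


def block_attention(q: Vector, keys: List[Vector],
                    values: List[Vector]) -> Tuple[Vector, float]:
    """Attend one query over one block; return (output, log-sum-exp of scores)."""
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    denom = sum(exps)
    dim = len(values[0])
    out = [sum(e * v[d] for e, v in zip(exps, values)) / denom
           for d in range(dim)]
    return out, m + math.log(denom)


def merge_blocks(partials: List[Tuple[Vector, float]]) -> Vector:
    """Combine per-block outputs, weighting each by its softmax normalizer."""
    top = max(lse for _, lse in partials)
    weights = [math.exp(lse - top) for _, lse in partials]
    total = sum(weights)
    dim = len(partials[0][0])
    return [sum(w * out[d] for w, (out, _) in zip(weights, partials)) / total
            for d in range(dim)]


if __name__ == "__main__":
    q = [1.0, 0.5]
    keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -1.0]]
    values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 3.0]]
    parts = [block_attention(q, keys[:2], values[:2]),
             block_attention(q, keys[2:], values[2:])]
    print(merge_blocks(parts))
```

Only the small `(output, log-sum-exp)` pair has to be exchanged per block, which is why this kind of merge suits settings where blocks of the context live on different devices.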

Model Training and Deployment

Model Context Protocol Servers

Model Context Protocol Servers is a project that showcases the versatility and scalability of the Model Context Protocol (MCP). It provides a set of reference implementations and community-contributed servers that demonstrate how to use MCP to provide secure, controlled access to tools and data sources for large language models (LLMs). Each MCP server is implemented using the TypeScript MCP SDK or Python MCP SDK. Managed by Anthropic and built with the community, this project is open source and encourages contributions of servers and improvements.

Large Language Models

WorkflowLLM

WorkflowLLM is a data-centric framework designed to enhance the orchestration capabilities of large language models (LLMs). At its core is WorkflowBench, a large-scale supervised fine-tuning dataset containing 106,763 samples from 1,503 APIs across 83 applications and 28 categories. WorkflowLLM fine-tunes the Llama-3.1-8B model to create the WorkflowLlama model optimized specifically for workflow orchestration tasks. Experimental results indicate that WorkflowLlama excels in orchestrating complex workflows and generalizes well to unseen APIs.

Workflow Orchestration

Agora

Agora is a simple cross-platform protocol that lets heterogeneous large language models (LLMs) communicate efficiently with each other through negotiation. Agents use natural language for rare communications and negotiate a structured data protocol (e.g., JSON) for frequent ones. Once a protocol is established, the LLMs implement routines, simple scripts (e.g., in Python) that send or receive data. Subsequent communications are handled by these routines, which reduces dependency on LLMs and improves efficiency, versatility, and portability.
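To make the idea of a routine concrete, here is a minimal sketch of what a pair of send and receive scripts might look like once two agents have agreed on a JSON format. The protocol name, field list, and function names are hypothetical and chosen only for illustration; Agora itself does not prescribe them.

```python
import json

# Hypothetical outcome of a negotiation between two agents.
NEGOTIATED_PROTOCOL = {
    "name": "weather_query",
    "fields": {"city": str, "date": str},
}


def send_routine(payload: dict) -> str:
    """Validate and serialize a message without involving an LLM."""
    for field, field_type in NEGOTIATED_PROTOCOL["fields"].items():
        if not isinstance(payload.get(field), field_type):
            raise ValueError(f"field {field!r} must be {field_type.__name__}")
    message = {"protocol": NEGOTIATED_PROTOCOL["name"], "body": payload}
    return json.dumps(message, sort_keys=True)


def receive_routine(raw: str) -> dict:
    """Parse a message; unknown protocols would fall back to natural language."""
    message = json.loads(raw)
    if message.get("protocol") != NEGOTIATED_PROTOCOL["name"]:
        raise ValueError("unknown protocol; fall back to natural language")
    return message["body"]


if __name__ == "__main__":
    wire = send_routine({"city": "Oslo", "date": "2025-01-01"})
    print(receive_routine(wire))
```

The point of the design is that frequent exchanges run through cheap deterministic code like this, while the LLM is only needed when a message falls outside any negotiated format.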

Development & Tools

5ire

5ire is an AI product centered on simplicity and user-friendliness, designed to enable even beginners to easily harness large language models. It supports the parsing and vectorization of various document formats and includes features such as a local knowledge base, usage analytics, a prompt library, bookmarks, and quick keyword search. As an open-source project, 5ire is available for free download and also offers a pay-as-you-go API service for large language models.

Knowledge Management

O1-Journey

O1-Journey is a project initiated by the GAIR research group at Shanghai Jiao Tong University, aimed at replicating and reimagining the capabilities of OpenAI's O1 model. The project introduces a training paradigm called 'journey learning' and reports building the first model that integrates search and learning in mathematical reasoning. By incorporating trial and error, correction, backtracking, and reflection, journey learning offers an effective way to tackle complex reasoning tasks.

Research Equipment

URL Parser Online

URL Parser Online is an online tool that converts complex URLs into input formats suitable for large language models (LLMs). It helps developers and researchers handle and parse URL data more effectively, particularly in web content analysis and data extraction tasks, where demand for URL processing has grown with the rapid increase in internet data. The tool offers a straightforward user interface and efficient parsing. The service is currently free and targets developers and data analysts.

Development & Tools

SELA

SELA is an innovative system that enhances automated machine learning (AutoML) by integrating Monte Carlo Tree Search (MCTS) with LLM-based agents. Traditional AutoML methods often produce low-diversity and suboptimal code, limiting their effectiveness in model selection and integration. SELA represents pipeline configurations as trees, enabling agents to intelligently explore the solution space and iteratively refine strategies based on experimental feedback.
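The tree search described above can be sketched with the standard UCT rule used in many MCTS implementations. This is a generic illustration under assumed names (`Node`, `uct`, `select`, `backpropagate`, and the exploration constant `c`), not SELA's actual code; in SELA the score would come from running an experiment with the chosen pipeline configuration.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """One pipeline decision, e.g. a preprocessing or model choice."""
    choice: str
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    total_score: float = 0.0


def uct(node: Node, c: float = 1.4) -> float:
    """Upper confidence bound: average score plus an exploration bonus."""
    if node.visits == 0:
        return math.inf  # always try unvisited configurations first
    exploit = node.total_score / node.visits
    explore = c * math.sqrt(math.log(node.parent.visits) / node.visits)
    return exploit + explore


def select(root: Node) -> Node:
    """Walk down the tree, following the child with the highest UCT value."""
    node = root
    while node.children:
        node = max(node.children, key=uct)
    return node


def backpropagate(node: Optional[Node], score: float) -> None:
    """Push an experiment's score back up to the root."""
    while node is not None:
        node.visits += 1
        node.total_score += score
        node = node.parent


if __name__ == "__main__":
    root = Node("root")
    root.children = [Node("scale_features", root), Node("raw_features", root)]
    backpropagate(root.children[0], 0.9)  # hypothetical validation score
    print(select(root).choice)
```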

Model Training and Deployment

LongVU

LongVU is a long-video language understanding model that reduces the number of video tokens through a spatiotemporal adaptive compression mechanism while preserving visual details in lengthy videos. This lets it process a large number of video frames within a limited context length while losing only a small amount of visual information, which substantially improves understanding and analysis of long video content. LongVU surpasses existing methods across a range of video understanding benchmarks, particularly on tasks involving videos up to one hour long. It also scales down effectively to smaller model sizes while maintaining state-of-the-art video understanding performance.

Model Training and Deployment

FakeShield

FakeShield is a multimodal framework designed to address two primary challenges in image forgery detection and localization (IFDL): the black-box nature of detection mechanisms and limited generalization across different tampering methods. By using GPT-4o to enrich existing IFDL datasets, the authors created the Multi-Modal Tamper Description dataSet (MMTD-Set) to train FakeShield's tampering analysis capabilities. The framework includes a Domain Tag-guided Explainable Forgery Detection Module (DTE-FDM) and a Multi-modal Forgery Localization Module (MFLM), which handle the interpretation of different types of tampering and guide localization through detailed textual descriptions. FakeShield outperforms other methods in detection accuracy and F1 score, providing an interpretable solution.

BitNet

BitNet (bitnet.cpp) is Microsoft's official inference framework for 1-bit large language models (LLMs) such as BitNet b1.58. It provides a suite of optimized kernels that support fast, lossless inference of 1.58-bit models on CPUs (with NPU and GPU support planned). On ARM CPUs, BitNet achieves speedups of 1.37x to 5.07x and reduces energy consumption by 55.4% to 70.0%. On x86 CPUs, speedups range from 2.37x to 6.17x, with energy reductions of 71.9% to 82.2%. BitNet can also run a 100B-parameter BitNet b1.58 model on a single CPU at speeds close to human reading rate, which broadens the possibilities for running large language models on local devices.
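The "1.58-bit" figure comes from ternary weights: each weight takes one of three values, and log2(3) is roughly 1.58 bits. A minimal sketch of absmean ternary quantization, the scheme described for BitNet b1.58, is shown below. The function name and the pure-Python list representation are illustrative assumptions; the real framework operates on packed tensors with optimized kernels.

```python
import math
from typing import List, Tuple


def absmean_ternary(weights: List[float],
                    eps: float = 1e-8) -> Tuple[List[int], float]:
    """Scale by the mean absolute value, then round and clip to {-1, 0, 1}."""
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    quantized = [max(-1, min(1, round(w / scale))) for w in weights]
    return quantized, scale


# Information content of one ternary weight, in bits.
BITS_PER_WEIGHT = math.log2(3)

if __name__ == "__main__":
    q, scale = absmean_ternary([0.9, -0.05, -1.4, 0.3])
    print(q, scale, BITS_PER_WEIGHT)
```

With weights restricted to -1, 0, and 1, a matrix-vector product reduces to additions and subtractions of activations, which is what makes fast CPU inference feasible.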

Model Training and Deployment

awesome-LLM-resources

awesome-LLM-resources is a platform that aggregates global resources for large language models (LLMs), offering a range of tools and resources from data acquisition and fine-tuning to inference, evaluation, and real-world applications. Its significance lies in providing researchers and developers with a comprehensive resource library to facilitate the efficient development and optimization of their language models. Maintained by Wang Rongsheng, the platform is continuously updated, providing robust support for the advancement of the LLM field.

AI tools website directory

VirtualWife

VirtualWife is a virtual digital human project aimed at creating a virtual partner with its own 'soul.' The project supports live streaming on Bilibili and is compatible with large language models like OpenAI and Ollama. VirtualWife can provide emotional companionship and serve as a relationship mentor and mental health consultant, fulfilling human emotional needs. The project is currently in the incubation stage, and the author has devoted significant personal time to development, hoping users can support its growth by giving it a star.

AI virtual girlfriend

MM1.5

MM1.5 is a family of multimodal large language models (MLLMs) designed to improve text-rich image understanding, visual referring and grounding, and multi-image reasoning. Building on the MM1 architecture, it adopts a data-centric training approach and systematically explores the impact of different data mixtures across the entire model training lifecycle. The models range from 1B to 30B parameters and include both dense and mixture-of-experts (MoE) variants. Extensive empirical and ablation studies document the training process and design decisions, offering guidance for future MLLM research.

AutoDAN-Turbo

AutoDAN-Turbo is an automated jailbreak framework that operates without human intervention, designed to discover and apply strategies for bypassing the safety restrictions of large language models (LLMs). It can automatically develop diverse attack strategies, significantly increasing attack success rates, and can incorporate existing human-designed jailbreak strategies into a unified framework. Its value lies in helping improve the security and reliability of LLMs in adversarial settings by offering a new automated approach to red-team assessment.

Lumigator

Developed by Mozilla.ai, Lumigator is a product that assists developers in choosing the most appropriate large language model (LLM) for their specific projects. It evaluates models using task-specific metrics, ensuring that the chosen models meet project requirements. Lumigator aims to become an open-source platform that promotes ethical and transparent AI development while addressing gaps in the industry toolchain.

AI Development Aids

Tilores Identity RAG

Tilores Identity RAG is a platform providing customer data search, unification, and retrieval services for large language models (LLMs). It uses real-time fuzzy search technology to handle spelling errors and inaccurate information, delivering accurate, relevant, and unified customer data responses. The platform addresses challenges faced by large language models when retrieving structured customer data, such as data being spread across various sources, difficulties in finding customer data due to incomplete matching of search terms, and the complexities involved in unifying customer records. It allows for quick retrieval of structured customer data, the construction of dynamic customer profiles, and provides real-time, unified, and accurate customer data during queries.

AI development assistant

Mishi AI Community

The Mishi AI Community focuses on the intersection of artificial intelligence and product management, providing a comprehensive knowledge system and development use cases related to AI product management. Community members have the opportunity to become 'super individuals and one-person companies.' You can contact the community leaders via email or social media to join the AI PM community.

AI information platform

RD-Agent

RD-Agent is an automated research and development tool from Microsoft Research Asia that uses large language models to automate AI-driven R&D processes. By integrating with data-driven R&D systems, it automates parts of innovation and development work to improve R&D efficiency. Its decision-making and feedback mechanisms are intended to support cross-disciplinary innovation and knowledge transfer.

AI Development Aids

NVLM 1.0

NVLM 1.0 is a family of multimodal large language models (LLMs) that achieves state-of-the-art results on vision-language tasks, rivaling leading proprietary and open-access models. Notably, NVLM 1.0 shows improved text-only performance over its LLM backbone after multimodal training. The model weights and code have been released as open source for the community.

OneGen

OneGen is an efficient single-pass generation and retrieval framework designed for large language models (LLMs), intended for fine-tuning generation, retrieval, or mixed tasks. The core idea is to integrate generation and retrieval tasks within the same context by assigning the retrieval task to retrieval tokens generated autoregressively. This enables the LLM to perform both tasks in a single forward pass. This approach not only reduces deployment costs but also significantly decreases inference costs, as it avoids the need for two forward pass computations for queries.

Featured AI Tools

Flow AI

Flow is an AI-driven movie-making tool designed for creators, utilizing Google DeepMind's advanced models to allow users to easily create excellent movie clips, scenes, and stories. The tool provides a seamless creative experience, supporting user-defined assets or generating content within Flow. In terms of pricing, the Google AI Pro and Google AI Ultra plans offer different functionalities suitable for various user needs.

Video Production

NoCode

NoCode is a platform that requires no programming experience, allowing users to quickly generate applications by describing their ideas in natural language, aiming to lower development barriers so more people can realize their ideas. The platform provides real-time previews and one-click deployment features, making it very suitable for non-technical users to turn their ideas into reality.

Development Platform

ListenHub

ListenHub is a lightweight AI podcast generation tool that supports both Chinese and English. Based on cutting-edge AI technology, it can quickly generate podcast content of interest to users. Its main advantages include natural dialogue and ultra-realistic voice effects, allowing users to enjoy high-quality auditory experiences anytime and anywhere. ListenHub not only improves the speed of content generation but also offers compatibility with mobile devices, making it convenient for users to use in different settings. The product is positioned as an efficient information acquisition tool, suitable for the needs of a wide range of listeners.

MiniMax Agent

MiniMax Agent is an AI companion built on multimodal technology. Its MCP-based multi-agent collaboration lets teams of AI agents work together on complex problems. It provides features such as instant answers, visual analysis, and voice interaction, which the product claims can increase productivity tenfold.

Multimodal technology

Tencent Hunyuan Image 2.0

Tencent Hunyuan Image 2.0 is Tencent's latest AI image generation model, with significantly improved generation speed and image quality. Using an ultra-high-compression-ratio codec and a new diffusion architecture, it can generate images in milliseconds, avoiding the wait typical of traditional generation. The model also improves image realism and detail by combining reinforcement learning with human aesthetic knowledge, making it suitable for professional users such as designers and creators.

Image Generation

OpenMemory MCP

OpenMemory is an open-source personal memory layer that provides private, portable memory management for large language models (LLMs). It ensures users have full control over their data, maintaining its security when building AI applications. This project supports Docker, Python, and Node.js, making it suitable for developers seeking personalized AI experiences. OpenMemory is particularly suited for users who wish to use AI without revealing personal information.

FastVLM

FastVLM is an efficient vision encoding design for vision language models. It uses the FastViTHD hybrid vision encoder to reduce both the encoding time for high-resolution images and the number of output tokens, improving speed while maintaining accuracy. FastVLM aims to give developers strong vision-language processing capabilities across a range of scenarios, especially on mobile devices that require fast response.

Image Processing

LiblibAI

LiblibAI is a leading Chinese AI creative platform offering powerful AI creative tools to help creators bring their imagination to life. The platform provides a vast library of free AI creative models, allowing users to search and utilize these models for image, text, and audio creations. Users can also train their own AI models on the platform. Focused on the diverse needs of creators, LiblibAI is committed to creating inclusive conditions and serving the creative industry, ensuring that everyone can enjoy the joy of creation.
